@philip-paul-mueller
Collaborator

@philip-paul-mueller philip-paul-mueller commented Dec 18, 2025

This PR introduces the scheduled exchange feature from GHEX into ICON4Py.

A scheduled exchange allows the exchange function to be called before all preceding work has completed; the exchange itself waits until that work is done. A similar feature is the "scheduled wait", which allows the receive to be initiated without waiting for its completion.

In addition, this PR renames the functions related to halo exchange:

  • exchange() was renamed to start().
  • wait() was renamed to finish() (which may now return before the transfer has fully concluded).
  • exchange_and_wait() was renamed to exchange().

All of these functions now accept an argument called stream, which defaults to DEFAULT_STREAM. It indicates how synchronization with the stream should be performed.
For start(), it means that the actual exchange should not start until all work previously submitted to stream has finished. For finish(), it means that further work submitted to stream should not start until the exchange has ended. For finish(), it is also possible to specify BLOCK, which means that finish() waits until the transfer has fully finished.
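
The stream semantics above can be sketched with a toy model. This is only an illustration under stated assumptions, not the ICON4Py or GHEX implementation: the class `ToyHaloExchange`, the enum `_Sentinel`, and the recorded log tuples are hypothetical, and it is assumed that `exchange()` is equivalent to `start()` followed by `finish()` on the same stream. No data is transferred; the model only records which ordering guarantee each call requests.

```python
import enum


class _Sentinel(enum.Enum):
    # Hypothetical stand-ins for the DEFAULT_STREAM and BLOCK values named above.
    DEFAULT_STREAM = enum.auto()
    BLOCK = enum.auto()


DEFAULT_STREAM = _Sentinel.DEFAULT_STREAM
BLOCK = _Sentinel.BLOCK


class ToyHaloExchange:
    """Toy model: records the requested ordering, transfers nothing."""

    def __init__(self):
        self.log = []

    def start(self, *fields, stream=DEFAULT_STREAM):
        # The transfer may only begin once work already submitted to `stream` is done.
        self.log.append(("start", stream, "transfer waits for prior work on stream"))

    def finish(self, stream=DEFAULT_STREAM):
        if stream is BLOCK:
            # The caller is held until the transfer has fully finished.
            self.log.append(("finish", stream, "host blocks until transfer is complete"))
        else:
            # May return early; only later work on `stream` waits for the transfer.
            self.log.append(("finish", stream, "later work on stream waits for transfer"))

    def exchange(self, *fields, stream=DEFAULT_STREAM):
        # Assumption: the former exchange_and_wait() is start() plus finish().
        self.start(*fields, stream=stream)
        self.finish(stream=stream)


halo = ToyHaloExchange()
halo.start("field_a")        # scheduled after prior work on the default stream
halo.finish(stream=BLOCK)    # explicitly wait for completion
halo.exchange("field_b")     # start + finish, both ordered against the default stream
```

In this sketch, only the `BLOCK` variant of `finish()` guarantees that the transfer is complete when the call returns; the other variant expresses an ordering constraint on later stream work.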

The orchestrator was not updated, but the changes were made in such a way that it continues to work in diffusion, although using the original, blocking behaviour.

Note:
The CI fails for cscs/extra, but it also fails on current main; see this test PR: #982

@philip-paul-mueller
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Collaborator Author

cscs-ci run extra

@philip-paul-mueller
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Collaborator Author

cscs-ci run extra

**NOTE:**
This commit still follows the old nomenclature, where `None` means default stream.
Most likely this will change such that `None` means "not using the `schedule_*()` functions" and another singleton is used for the default stream.
@philip-paul-mueller
Collaborator Author

cscs-ci run default

- There are now two protocols that describe how to extract the underlying address.
	They are probably in the wrong location.
- `stream=None` no longer means "default stream" but is now equivalent to "do not use the scheduled version".
- To indicate the default stream, the singleton `DefaultStream` is used.
	The `cupy.cuda.Stream.null` singleton was not used, because it would require that `cupy` is present.
- However, using the default stream is still the default behaviour.
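
The stream conventions listed above could look roughly like the following. This is a hedged sketch, not the code from this PR: `DefaultStreamType`, `stream_address`, and the two duck-typed lookups (a `__cuda_stream__()` method returning a `(version, address)` tuple and a cupy-like `.ptr` attribute) are assumptions chosen for illustration, as is the use of address `0` for the default stream. The point is only that a module-level singleton can mark the default stream without importing `cupy`, while `None` selects the non-scheduled path.

```python
from typing import Any, Optional


class DefaultStreamType:
    """Hypothetical singleton marking the default stream; needs no cupy import."""

    _instance = None

    def __new__(cls):
        # Always hand back the same object so `is` comparisons are reliable.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self) -> str:
        return "DefaultStream"


DefaultStream = DefaultStreamType()


def stream_address(stream: Any = DefaultStream) -> Optional[int]:
    """Return a raw stream address, or None to request the non-scheduled path."""
    if stream is None:
        return None  # do not use the scheduled version
    if stream is DefaultStream:
        return 0  # assumption: address 0 denotes the default stream
    if hasattr(stream, "__cuda_stream__"):
        # Assumption: protocol method returning a (version, address) tuple.
        return int(stream.__cuda_stream__()[1])
    if hasattr(stream, "ptr"):
        # Assumption: cupy-like stream object exposing its address as `.ptr`.
        return int(stream.ptr)
    raise TypeError(f"cannot extract a stream address from {stream!r}")


class FakeStream:
    """Stand-in for a real stream object, used only for this example."""

    ptr = 1234


address_default = stream_address()           # default behaviour: default stream
address_none = stream_address(None)          # non-scheduled path
address_fake = stream_address(FakeStream())  # address taken from `.ptr`
```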
@philip-paul-mueller
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Collaborator Author

cscs-ci run extra

@philip-paul-mueller
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Collaborator Author

cscs-ci run extra

@philip-paul-mueller
Collaborator Author

philip-paul-mueller commented Dec 19, 2025

There is a failure in extra; however, this error is also present on main.

See this test PR: #982

@philip-paul-mueller
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Collaborator Author

cscs-ci run dace

@philip-paul-mueller philip-paul-mueller changed the title from "[DO NOT MERGE]: Scheduled Halo Exchange" to "Scheduled Halo Exchange" Dec 19, 2025
@philip-paul-mueller
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Collaborator Author

cscs-ci run extra

@philip-paul-mueller
Collaborator Author

cscs-ci run distributed

@philip-paul-mueller
Collaborator Author

cscs-ci run distributed

@philip-paul-mueller
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Collaborator Author

cscs-ci run extra

@philip-paul-mueller
Collaborator Author

cscs-ci run distributed

@philip-paul-mueller
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Collaborator Author

cscs-ci run extra

@philip-paul-mueller
Collaborator Author

cscs-ci run distributed

@philip-paul-mueller
Collaborator Author

cscs-ci run distributed

@github-actions

github-actions bot commented Feb 6, 2026

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • cscs-ci run distributed

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:

  • cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.

@philip-paul-mueller
Collaborator Author

cscs-ci run distributed

@philip-paul-mueller
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Collaborator Author

cscs-ci run extra

@philip-paul-mueller
Collaborator Author

cscs-ci run default

@philip-paul-mueller
Collaborator Author

cscs-ci run dace

@philip-paul-mueller
Collaborator Author

cscs-ci run extra
